Abstract

A new fuzzy vault scheme based on subspace codes is proposed and analyzed. An important
feature compared to the classical scheme is the possibility of hiding the biometric
data by means of hash functions, thereby improving security.

1 Introduction 

Fuzzy vault is the term used by Juels and Sudan in [5] to describe a cryptographic primitive in 
which a key k is hidden by a set of features A in such a way that any witness B which is close 
enough to A under the set difference metric can decommit k. Fuzzy vault is related to the fuzzy 
commitment scheme of Juels and Wattenberg [6], which gives a solution for noisy hashing of data 
for the Hamming distance. A dual version of this scheme was considered by the authors in [TJ [3] . 

The motivation for fuzzy vault is largely predicated on a security concern in the processing of 
biometric data. In early biometric authentication systems, comparison of a biometric was done 
against an image stored locally on the machine, rather than in some hashed form. For security 
purposes, passwords are normally stored in hashed form. Since biometric data is irreplaceable in
the sense that once compromised it cannot be changed, storing the data locally in unhashed
form can pose a significant security risk [3]. Biometric data is inherently noisy, however, so direct
hashing of a user's features would prevent the authentic user from accessing the system unless 
some tolerance is allowed. Using error correcting codes, fuzzy vault can recover a secret key 
hidden by features even in the presence of noise. Recent advancements have been made in the 
pre-alignment of biometrics (cf. [9] and references therein), specifically fingerprints, allowing for
comparative methods without storage of the image itself. These advancements make fuzzy vault 
a promising and feasible cryptographic solution for noisy data. 

Recently, much work has been done in the area of error correcting codes in projective space. 
These codes turn out to be appropriate for error correction in networks under the setting of 
Kötter and Kschischang, and are referred to as error correcting network codes, projective space
codes, or subspace codes [7]. The aim of this paper is to show that the construction of the fuzzy
vault in [5] can be extended to work for subspace codes in an analogous way, with advantages and
limitations. Namely, we present a construction for a fuzzy vault based on constant dimension
subspace codes, a class of error correcting codes in projective $n$-space over a finite field $\mathbb{F}_q$.
Furthermore, we illustrate the effectiveness of our implementation by using spread codes,
a particular class of subspace codes, to obtain some security estimates for the vault. Lastly, in
Section 5 we show how this scheme can prevent an attacker from recovering the features even if
the attacker is able to obtain the key.

2 Background 

The fuzzy vault scheme proposed in [5] is based on polynomial evaluation and will henceforth be
called the PFV (polynomial fuzzy vault) scheme. Let $A \subset \mathbb{F}_q$ and let
$\kappa = (k_0, k_1, \dots, k_{\ell-1}) \in \mathbb{F}_q^{\ell}$ be the secret key. We require that
$|A| = t > \ell$. Furthermore, choose $r > t$ and select a set $C \subset \mathbb{F}_q$ consisting of
$r - t$ points of $\mathbb{F}_q$ which are not in $A$. Construct the polynomial
$\kappa(x) = k_0 + k_1 x + \dots + k_{\ell-1} x^{\ell-1}$ and the sets
$\mathcal{A}, \mathcal{B} \subset \mathbb{F}_q \times \mathbb{F}_q$ according to

\[ \mathcal{A} = \{(x, \kappa(x)) \mid x \in A\}, \]
\[ \mathcal{B} = \{(x, y) \mid x \in C,\ y \neq \kappa(x)\}. \]

We will call $\mathcal{V} = \mathcal{A} \cup \mathcal{B}$ the points of the vault, with $\mathcal{A}$
the authentic points and $\mathcal{B}$ the chaff points. Lastly, an appropriate Reed-Solomon decoder
decode is selected and the vault is defined to be $(\mathcal{V}, r, t, \text{decode})$. Here, an
appropriate Reed-Solomon decoder would be a minimum distance decoder capable of correcting both
erasures and errors.

If a witness attempts to gain access to the vault, then the witness submits a set $W \subset \mathbb{F}_q$
which is close to $A$ under the set difference metric and then constructs the polynomial $f$ by
interpolating the points of $\mathcal{V}$ whose $x$-coordinates correspond to $W$. The witness then uses
decode to correct $f$ to the nearest codeword in the Reed-Solomon code. Using a minimum distance
erasure decoder, it is possible to recover the key if

\[ |(A \cup C) \setminus W| + 2\,|W \setminus A| \le d - 1, \tag{1} \]

where $d$ is the minimum distance of the Reed-Solomon code. The former set represents the
erasures and the latter the errors that would occur between the witness and the authentic sets.
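
The locking step of the PFV scheme can be sketched in a few lines. The following is only an illustrative sketch under stated assumptions: the prime field size `P = 251`, the function names `poly_eval` and `lock_pfv`, and the use of Python's `random` module are all hypothetical choices made here, not part of [5]. Unlocking (interpolation followed by Reed-Solomon decoding) is not shown.

```python
import random

P = 251  # assumed prime field size, chosen only for illustration

def poly_eval(coeffs, x, p=P):
    """Evaluate k_0 + k_1 x + ... + k_{l-1} x^{l-1} mod p using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def lock_pfv(key, features, r, p=P, rng=random):
    """Return r vault points: authentic points on the polynomial, chaff points off it."""
    features = set(features)
    assert len(features) > len(key) and len(features) < r <= p
    vault = [(x, poly_eval(key, x, p)) for x in features]       # authentic points
    candidates = [x for x in range(p) if x not in features]
    for x in rng.sample(candidates, r - len(features)):         # chaff x-values
        y = rng.randrange(p)
        while y == poly_eval(key, x, p):                        # chaff must miss the polynomial
            y = rng.randrange(p)
        vault.append((x, y))
    rng.shuffle(vault)                                          # hide which points are authentic
    return vault

vault = lock_pfv(key=[7, 42, 3], features={5, 17, 80, 123, 200}, r=20)
```

A production system would use a cryptographically secure random source instead of `random`; the module is used here only to keep the sketch short.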

It was shown in [11] that certain reasonable parameters for the PFV scheme cause the system
to be susceptible to a brute force attack. Choi et al. in [5] speed up the attack by using a
fast polynomial reconstruction algorithm. These improvements in attacks show that additional
security measures should be taken to prevent the loss of a user's features. In Section 5 we
demonstrate that additional security measures can be taken to prevent a successful attack from
compromising the features.

We will present an alternative fuzzy vault scheme in which we use subspace codes rather
than Reed-Solomon codes, and we restrict our attention to constant dimension codes [7]. A constant
dimension subspace code is a subset of the Grassmannian $\mathcal{G}_q(k,n)$, the set of all
$k$-dimensional subspaces of $\mathbb{F}_q^n$. The subspace distance defines a metric on
$\mathcal{G}_q(k,n)$ given by

\[ d_S(U, V) = \dim(U + V) - \dim(U \cap V), \]

for $U, V \in \mathcal{G}_q(k,n)$. While finding good subspace codes is still an open research problem,
there are many candidates now, including the Reed-Solomon-like and spread code constructions [7, 10].
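
The subspace distance can be computed from ranks alone, since $\dim(U \cap V) = \dim U + \dim V - \dim(U + V)$. The sketch below assumes the binary field $\mathbb{F}_2$ and encodes vectors as Python integers (bitmasks); the names `rank_gf2` and `subspace_distance` are hypothetical helpers introduced here for illustration.

```python
def rank_gf2(rows):
    """Rank over F_2 of vectors encoded as integer bitmasks (Gaussian elimination)."""
    pivots = {}  # leading-bit position -> basis vector with that leading bit
    for v in rows:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]      # cancel the leading bit and keep reducing
    return len(pivots)

def subspace_distance(gens_u, gens_v):
    """d_S(U, V) = dim(U + V) - dim(U ∩ V) = 2 dim(U + V) - dim U - dim V."""
    dim_sum = rank_gf2(list(gens_u) + list(gens_v))
    return 2 * dim_sum - rank_gf2(gens_u) - rank_gf2(gens_v)

U = [0b1000, 0b0100]
V = [0b0010, 0b0001]
W = [0b1000, 0b0010]
print(subspace_distance(U, V), subspace_distance(U, W), subspace_distance(U, U))
```

Here $U$ and $V$ intersect trivially, so their distance is $2k = 4$, while $U$ and $W$ share a one-dimensional subspace and are at distance 2.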

Definition 2.1. A subset $\mathcal{S} \subset \mathcal{G}_q(k,n)$ is called a spread if it satisfies

• $U \cap V = \{0\}$ for all distinct $U, V \in \mathcal{S}$, and

• $\bigcup_{U \in \mathcal{S}} U = \mathbb{F}_q^n$.

Spreads are known to exist if and only if $k \mid n$. For the remainder of this section, let $n = ks$.
From the definition, it is straightforward to see that $d_S(U, V) = 2k$ for all distinct
$U, V \in \mathcal{S}$ and $|\mathcal{S}| = \frac{q^n - 1}{q^k - 1}$. An explicit
construction of a spread can be found in [10], and it is this construction we use as the definition
of a spread code.

Definition 2.2. Let $p \in \mathbb{F}_q[x]$ be an irreducible monic polynomial of degree $k$ and
$P \in \mathbb{F}_q^{k \times k}$ be its companion matrix. Let $n = ks$ for $s \in \mathbb{N}$. Then,

\[ \mathcal{S} = \{\mathrm{rowsp}(A_1 \mid \cdots \mid A_s) \mid A_i \in \mathbb{F}_q[P]\} \]

is called a $(k, n)$-spread code, where $\mathrm{rowsp}(A)$ is the row space of a matrix $A$.
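
As a small sanity check of this construction, the sketch below builds the spread code for $q = 2$, $k = 2$, $n = 4$ using the irreducible polynomial $x^2 + x + 1$. It is a toy illustration only: the tuple-based matrix encoding and the helper names are assumptions made here, and the all-zero block matrix is excluded because its row space is not $k$-dimensional.

```python
from itertools import product

I2 = ((1, 0), (0, 1))
Z2 = ((0, 0), (0, 0))
P = ((0, 1), (1, 1))  # companion matrix of x^2 + x + 1 over F_2

def mat_add(A, B):
    """Entrywise sum of two 2x2 matrices over F_2."""
    return tuple(tuple((a + b) % 2 for a, b in zip(ra, rb)) for ra, rb in zip(A, B))

FIELD = [Z2, I2, P, mat_add(P, I2)]  # F_2[P] = {0, I, P, P + I}, a field with 4 elements

def rowspace(A1, A2):
    """Nonzero vectors of rowsp(A1 | A2), a subspace of F_2^4."""
    rows = [A1[i] + A2[i] for i in range(2)]  # concatenate the two blocks row by row
    span = set()
    for c0, c1 in product((0, 1), repeat=2):
        v = tuple((c0 * a + c1 * b) % 2 for a, b in zip(rows[0], rows[1]))
        if any(v):
            span.add(v)
    return frozenset(span)

# Distinct row spaces of all nonzero block matrices (A1 | A2) with blocks in F_2[P].
spread = {rowspace(A1, A2) for A1 in FIELD for A2 in FIELD if (A1, A2) != (Z2, Z2)}

assert len(spread) == (2**4 - 1) // (2**2 - 1)       # 5 codewords
assert len(frozenset().union(*spread)) == 2**4 - 1   # they cover all nonzero vectors
```

Since the five subspaces contain three nonzero vectors each and together cover all fifteen nonzero vectors of $\mathbb{F}_2^4$, they must intersect pairwise trivially, as a spread requires.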

In this paper, we present a general construction of a subspace fuzzy vault for constant
dimension subspace codes, and provide an example using spread codes.

3 A Fuzzy Vault Scheme Utilizing Subspace Coding 

We will call this particular implementation of fuzzy vault the SFV (subspace fuzzy vault) scheme.
Unlike the PFV scheme, in which the key is given by the coefficients of a polynomial, the key
in this scheme is a particular matrix representation of a subspace. To construct the vault, first
select a constant dimension subspace code $\mathcal{C} \subset \mathcal{G}_q(k,n)$. Let
$K \in \mathbb{F}_q^{k \times n}$ be a matrix over $\mathbb{F}_q$ of full rank whose row space is a
codeword of $\mathcal{C}$ and let $\kappa$ be the key, where $\kappa$ is the reduced row echelon form
of $K$. We will hide the key by a set of features $A \subset \mathbb{F}_q^k$ with $|A| = t > k$. Choose a
set of chaff points, $C \subset \mathbb{F}_q^k$, consisting of $r - t$ points not in $A$. As in the PFV
case, we have the authentic and chaff points, given by

\[ \mathcal{A} = \{(x, xK) \mid x \in A\}, \]
\[ \mathcal{B} = \{(x, y) \mid x \in C,\ y \notin \mathrm{rowsp}(K)\}. \tag{2} \]

We set $\mathcal{V} = \mathcal{A} \cup \mathcal{B}$, and the vault is then
$(\mathcal{V}, r, t, \text{decode})$, where decode is an appropriate decoder of the subspace code.

In order for a witness to decommit $\kappa$, a set $W \subset \mathbb{F}_q^k$ is submitted and the second
coordinates of the elements in the vault whose first coordinates correspond to $W$ are used to generate a
subspace of $\mathbb{F}_q^n$. This subspace is then decoded to yield a codeword of $\mathcal{C}$. It is
possible to decode if

\[ |W \setminus A| + |A \setminus W| \le \tfrac{1}{2}(d - 1), \tag{3} \]

where $d$ is the minimum subspace distance of $\mathcal{C}$. It may be possible to do even better; this
is elaborated in Section 4.3. In contrast to (1), where errors have twice the weight of
erasures, in subspace codes they have equal weight [7].

Algorithm 1 gives a method for producing a vault. The function keygen produces a random matrix $K$
of the kind described above (and hence a random key), randveck produces a random vector in
$\mathbb{F}_q^k$, randvecn produces a random vector in $\mathbb{F}_q^n$, and for a subspace
$V \subset \mathbb{F}_q^n$, chaffgen($V$) returns a random vector in $\mathbb{F}_q^n \setminus V$.

Algorithm 1 Create vault

Require: Finite field $\mathbb{F}_q$, $A = \{x_1, \dots, x_t\} \subset \mathbb{F}_q^k$ with $t > k$, $r > t$,
         keygen, randveck, randvecn, chaffgen
  K ← keygen
  V ← {}
  for i = 1 → t do
    V ← V ∪ {(x_i, x_i K)}
  end for
  while |V| < r do
    v ← randveck
    while v ∈ A do
      v ← randveck
    end while
    w ← chaffgen(rowsp(K))
    V ← V ∪ {(v, w)}
  end while
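
Algorithm 1 can be sketched in runnable form as follows. This is a minimal sketch under several assumptions that are not in the algorithm itself: the field is $\mathbb{F}_2$, vectors are encoded as integer bitmasks, the matrix `K` is passed in by the caller instead of being produced by keygen, and chaff first coordinates are also kept distinct from each other. The names `vec_mat` and `create_vault` are hypothetical.

```python
import random

def rank_gf2(rows):
    """Rank over F_2 of vectors encoded as integer bitmasks."""
    pivots = {}
    for v in rows:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

def vec_mat(x, K):
    """Compute x*K over F_2; bit i of x selects row K[i]."""
    y = 0
    for i, row in enumerate(K):
        if (x >> i) & 1:
            y ^= row
    return y

def create_vault(K, features, r, n, rng=random):
    """Return r points (x, y): authentic ones with y = x*K, chaff with y outside rowsp(K)."""
    k = len(K)
    features = list(dict.fromkeys(features))         # drop duplicate features
    t = len(features)
    assert rank_gf2(K) == k and n > k and k < t < r <= 2**k
    vault = {x: vec_mat(x, K) for x in features}     # authentic points
    while len(vault) < r:
        v = rng.randrange(2**k)                      # randveck
        if v in vault:                               # not a feature, not an earlier chaff point
            continue
        w = rng.randrange(2**n)                      # chaffgen by rejection sampling
        while rank_gf2(K + [w]) == k:                # w lies in rowsp(K): resample
            w = rng.randrange(2**n)
        vault[v] = w
    return sorted(vault.items())                     # sorting hides the insertion order

K = [0b100011, 0b010101, 0b001110]                   # a full-rank 3 x 6 matrix, for illustration
vault = create_vault(K, features=[1, 2, 3, 5, 6], r=7, n=6)
```

The toy matrix `K` is not drawn from any particular subspace code; in the scheme its row space would be a codeword of $\mathcal{C}$.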



4 Analysis of the SFV 

Let $\mathcal{C} \subset \mathcal{G}_q(k,n)$ be a constant dimension subspace code used to hide a key
$\kappa$ in a vault $\mathcal{V}$ with $t$ authentic points and $r - t$ chaff points. We will assume that
the set of features $\{x_1, \dots, x_t\}$ is a set of random elements of $\mathbb{F}_q^k$ and that the set
of second coordinates of the authentic points, $\{x_1 K, \dots, x_t K\}$, contains a set of $k$ linearly
independent vectors in $\mathbb{F}_q^n$. This assumption is justified by Lemma 4.1, a proof of which can
be found in [5], as well as Lemma 4.2. Lastly, we assume that the attacker knows the number of
authentic points, that is $t$.

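Lemma 4.1. Let $k \le \delta \le n$. The number of $\delta \times n$ matrices over $\mathbb{F}_q$ with
rank $k$ is given by

\[ N_q(k, \delta, n) = \prod_{i=0}^{k-1} \frac{(q^{\delta} - q^i)(q^n - q^i)}{q^k - q^i}. \tag{4} \]

It is clear that $\delta$ and $n$ play a symmetric role, thus $N_q(k, \delta, n) = N_q(k, n, \delta)$.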
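
The count in Lemma 4.1 is easy to evaluate exactly with integer arithmetic. The sketch below assumes the standard product formula $\prod_{i=0}^{k-1} (q^{\delta} - q^i)(q^n - q^i)/(q^k - q^i)$ for the number of rank-$k$ matrices, and compares it with a brute-force enumeration over $\mathbb{F}_2$ for tiny sizes. The function names are hypothetical.

```python
from itertools import product

def rank_gf2(rows):
    """Rank over F_2 of vectors encoded as integer bitmasks."""
    pivots = {}
    for v in rows:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

def num_matrices_of_rank(q, k, delta, n):
    """N_q(k, delta, n): number of delta x n matrices over F_q of rank k."""
    num, den = 1, 1
    for i in range(k):
        num *= (q**delta - q**i) * (q**n - q**i)
        den *= q**k - q**i
    return num // den  # the quotient is an exact integer

def brute_force_count(k, delta, n):
    """Count delta x n binary matrices of rank k by enumeration (tiny sizes only)."""
    return sum(1 for rows in product(range(2**n), repeat=delta)
               if rank_gf2(list(rows)) == k)

for k in range(3):
    assert num_matrices_of_rank(2, k, 2, 3) == brute_force_count(k, 2, 3)
```

For $2 \times 3$ binary matrices the three counts are 1, 21 and 42, which sum to $2^6 = 64$ as they should.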

Lemma 4.2. Let $x_1, \dots, x_t$ be as above and $K \in \mathbb{F}_q^{k \times n}$ be a matrix of rank $k$.
The probability that $\{x_1 K, \dots, x_t K\}$ contains a set of $k$ linearly independent vectors is given by

\[ \frac{N_q(k, k, t)}{q^{kt}}. \tag{5} \]

Proof. Since $K$ is a $k \times n$ matrix of rank $k$, it follows from the properties of the rank that
$(x_1 K, \dots, x_t K)^T = (x_1, \dots, x_t)^T K$ has the same rank as $(x_1, \dots, x_t)^T$. Thus, the
probability that $(x_1 K, \dots, x_t K)^T$ is a rank $k$ matrix is equal to the probability that
$(x_1, \dots, x_t)^T$ is a rank $k$ matrix, which is equal to (5), according to Lemma 4.1. □

For reasonable vault parameters, this probability is close to 1 and increases with $t$, which
justifies our assumption. Note that it is not necessary that the rank of the matrix be
equal to $k$. As long as the rank is not too small, the error correcting capability of the
subspace code can still correctly recover the key.
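
To see how close to 1 the probability in Lemma 4.2 is, one can bound it from below. The short derivation below is a sketch that assumes the standard count $N_q(k,k,t) = \prod_{i=0}^{k-1}(q^t - q^i)$ of full-rank $k \times t$ matrices and uses the elementary inequality $\prod_i (1 - a_i) \ge 1 - \sum_i a_i$ for $a_i \in [0,1]$.

```latex
\[
\frac{N_q(k,k,t)}{q^{kt}}
  = \prod_{i=0}^{k-1} \frac{q^{t}-q^{i}}{q^{t}}
  = \prod_{i=0}^{k-1} \bigl(1 - q^{\,i-t}\bigr)
  \;\ge\; 1 - \sum_{i=0}^{k-1} q^{\,i-t}
  = 1 - \frac{q^{k}-1}{(q-1)\,q^{t}}
  \;>\; 1 - \frac{q^{\,k-t}}{q-1}.
\]
```

For instance, with $q = 2$, $k = 16$ and $t = 38$ the probability exceeds $1 - 2^{-22}$.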
Now, the expected number of subsets of size $\delta$ out of $r > \delta$ random points in $\mathbb{F}_q^n$
that span a $k$-dimensional space can be estimated as

\[ a_q(k, \delta, n) = \frac{\binom{r}{\delta}\, N_q(k, \delta, n)}{q^{\delta n}}. \tag{6} \]

Ideally, an attacker would want to find a $\delta_0 < t$ so that $a_q(k, \delta_0, n) < 1$ in order to have
a high probability of recovering the key in the event that the $\delta_0$ points span a space of
dimension $k$. On the other hand, to counter this type of attack, one tries to keep $k$ very close to $t$,
so that $a_q$ does not get small.

We will approximate the complexity of an offline brute force attack. The attack is similar in
approach to that proposed in [5] and depends on finding a suitable $\delta_0$ so that the probability of
$\delta_0$ random vectors in $\mathbb{F}_q^n$ spanning a subspace of dimension $k$ is small.



4.1 Brute Force Attack 

The complexity of the following brute force attack depends on the difficulty of determining the
rank of an arbitrary $k \times n$ matrix over $\mathbb{F}_q$. The naive approach requires at most
$n(k^2 - k)$ operations, or $n(k^2 - k)/2$ if the field is $\mathbb{F}_2$. There exist fast algorithms
for determining the rank of a matrix, but these are only asymptotically better and are often much worse
for small values of $k$ and $n$.

It is noted in [11] that the average number of attempts for a user to guess $\delta$ points in the
authentic set is $\binom{r}{\delta} / \binom{t}{\delta} < 1.1\,(r/t)^{\delta}$ for $r > t > 5$. Given that
it takes $n(\delta^2 - \delta)/2$ operations to row reduce a $\delta \times n$ binary matrix, we obtain
the following upper bound for the expected time to recover the key.

Lemma 4.3. Let $\delta_0$ be such that $a_2(k, \delta_0, n) < 1$ in equation (6). On average, an attacker
can recover the secret key in $C \cdot (r/t)^{\delta_0}$ operations, where
$C < 0.55 \cdot n(\delta_0^2 - \delta_0)$.



4.2 Example Using Spread Codes 

As an example of how to construct a vault using subspace codes, we will use the spread codes defined
in Section 2. A $(k, n)$-spread code over $\mathbb{F}_q$ has the advantage of covering the entire space $\mathbb{F}_q^n$,
making the choice of chaff points simpler. Spread codes are somewhat restrictive in that the 
minimum distance is completely determined by k, unlike other subspace codes where one can 
trade off the distance with other parameters. Nevertheless we illustrate the construction using 
spread codes because of their simplicity. 

Example 1. Let us assume that the features belong to $\mathbb{F}_2^{16}$, so that $k = 16$. In this case,
we can correct a combination of errors and erasures up to 15. Furthermore, suppose that we can reliably
extract 38 features with 313 chaff points (in keeping with Clancy). We are free to choose
$n$ as long as it is a positive integer multiple of $k$. Choosing $n = 96$, so that we have roughly
$2^{80}$ keys, we can then compute $\delta_0 = 18$ from formula (6). From Lemma 4.3, the secret key can
be recovered in $2^{68.7}$ operations.
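
The numbers in Example 1 can be reproduced with exact arithmetic. The sketch below assumes the product formula for $N_q(k,\delta,n)$, takes $a_q(k,\delta,n) = \binom{r}{\delta} N_q(k,\delta,n)/q^{\delta n}$, and searches for the smallest $\delta \ge k$ with $a_q < 1$. It also assumes that the value 313 enters the formulas as $r$; this reading is ours, and it is the one that reproduces the stated figure of $2^{68.7}$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, log2

def num_matrices_of_rank(q, k, delta, n):
    """N_q(k, delta, n): number of delta x n matrices over F_q of rank k."""
    num, den = 1, 1
    for i in range(k):
        num *= (q**delta - q**i) * (q**n - q**i)
        den *= q**k - q**i
    return num // den

def expected_spanning_subsets(q, k, delta, n, r):
    """a_q(k, delta, n) = C(r, delta) * N_q(k, delta, n) / q^(delta * n), as an exact fraction."""
    return Fraction(comb(r, delta) * num_matrices_of_rank(q, k, delta, n), q**(delta * n))

def smallest_delta0(q, k, n, r, t):
    """Smallest delta in [k, t] with a_q(k, delta, n) < 1, or None if there is none."""
    for delta in range(k, t + 1):
        if expected_spanning_subsets(q, k, delta, n, r) < 1:
            return delta
    return None

def log2_attack_bound(n, delta0, r, t):
    """log2 of 0.55 * n * (delta0^2 - delta0) * (r/t)^delta0 (binary field)."""
    return log2(0.55 * n * (delta0**2 - delta0)) + delta0 * log2(r / t)

delta0 = smallest_delta0(q=2, k=16, n=96, r=313, t=38)
print(delta0)                                             # 18
print(round(log2_attack_bound(96, delta0, 313, 38), 1))   # 68.7
```

Using `Fraction` avoids any floating-point rounding when comparing $a_q$ with 1, at the cost of working with integers of a few thousand bits.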

It is shown that a $(k, n)$-spread code over $\mathbb{F}_q$ can be decoded in $O((n - k)k^5)$ operations.
For more information on spread codes and an efficient decoding algorithm, the reader is referred to [10].



4.3 Decoding 

As was mentioned in Section 3, erasures and errors have equal weight and the subspace can
be recovered if the sets $A$ and $W$ satisfy (3), although it is possible to correct more. For any
subspace $U$ and a subspace $V \in \mathcal{C}$, we can express $U = (U \cap V) \oplus E$ for some
subspace $E$. If the dimension of the constant dimension subspace code is $k$, then
$k - \dim(U \cap V)$ corresponds to a decrease of dimension (erasures or deletions) and $\dim(E)$
corresponds to an increase of dimension due to error (errors or insertions). Kötter and Kschischang
show in [7] that a minimum distance decoder for a subspace code can recover $V$ from $U$ if

\[ 2(\dim(E) + k - \dim(U \cap V)) \le d - 1, \tag{7} \]

where $d$ is the minimum distance of the subspace code.

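For a set $S \subset \mathbb{F}_q^k$, we will denote by $\langle S \rangle$ the subspace spanned by the
elements $\{sK \mid s \in S\}$. It is clear that $\dim\langle S \rangle \le |S|$. Furthermore, we can
state the following lemma.

Lemma 4.4. Let $A, W \subset \mathbb{F}_q^k$. Then,

\[ \dim(\langle A \rangle \cap \langle W \rangle) \ge \dim\langle A \cap W \rangle. \tag{8} \]

Proof. It follows from $\langle A \rangle \cap \langle W \rangle \supseteq \langle A \cap W \rangle$. □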

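Theorem 4.1. Let $A$ denote the authentic set and $W$ the witness set. It is possible to recover
the key using a minimum distance decoder if

\[ |W \setminus A| + |A \setminus W| \le \tfrac{1}{2}(d - 1), \tag{9} \]

where $d$ is the minimum distance of the subspace code.

Proof. We have

\begin{align*}
|W \setminus A| + |A \setminus W| &\ge \dim\langle W \setminus A \rangle + \dim\langle A \setminus W \rangle \\
&\ge \dim\langle W \setminus A \rangle + k - \dim\langle A \cap W \rangle \\
&\ge \dim\langle W \setminus A \rangle + k - \dim(\langle A \rangle \cap \langle W \rangle).
\end{align*}

Taking $U = \langle W \rangle$ and $V = \langle A \rangle$ in (7), we have
$\dim(E) \le \dim\langle W \setminus A \rangle$ by Lemma 4.4, so the result now follows from (7). □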
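
The sufficient condition of Theorem 4.1 depends only on the symmetric difference of the two sets, so it is simple to test. The helper below is a hypothetical illustration that reads the condition as $2(|W \setminus A| + |A \setminus W|) \le d - 1$; it says nothing about cases where decoding succeeds beyond this bound.

```python
def within_decoding_bound(authentic, witness, d):
    """Check the sufficient condition 2 * (|W \\ A| + |A \\ W|) <= d - 1."""
    A, W = set(authentic), set(witness)
    return 2 * (len(W - A) + len(A - W)) <= d - 1

A = set(range(38))                       # 38 authentic features (labels only)
W = set(range(10, 38)) | {100, 101, 102, 103, 104}
print(within_decoding_bound(A, W, d=32))           # 10 missing + 5 wrong = 15 differences
print(within_decoding_bound(A, W | {105}, d=32))   # 16 differences
```

With a spread code of dimension $k = 16$, for which $d = 2k = 32$, the first witness meets the bound and the second does not.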

It is possible that not all vectors in the authentic set are linearly independent. In that case, the
loss of a vector need not decrease the dimension of the subspace generated by the authentic set and,
likewise, the addition of a vector need not increase its dimension.



5 Considerations 

One of the disadvantages of using a biometric for security is that once an attacker knows a user's 
features, the user can never use a biometric scheme based on those features again. It is of interest 
to be able to regenerate a vault, in case by some means an attacker is able to recover the key. 
However, in both fuzzy vault schemes presented in this paper, finding the key is equivalent to 
finding the features as they are immediately retrievable as the first coordinate of the points in the 
authentic set. Ideally, the user should have the features obscured in such a way that obtaining 
the key is not equivalent to obtaining the features. For instance, one might want to store in the 
vault a hash of the features, instead of the features themselves. However, in the PFV scheme
this makes decoding impractical, as it relies on interpolation of the polynomial, for which it is
necessary to know the $x$-values. This concern was noted by Juels and Sudan in [5]. On the other
hand, the decoding of a subspace code relies only on the second coordinates in the vault; therefore,
we can modify the SFV scheme slightly by setting the sets $\mathcal{A}$ and $\mathcal{B}$ from Section 3 to be

\[ \mathcal{A} = \{(h(x), xK) \mid x \in A\}, \]
\[ \mathcal{B} = \{(h(x), y) \mid x \in C,\ y \notin \mathrm{rowsp}(K)\}, \tag{10} \]

for a suitable hash function $h$. This is a principal advantage of the SFV implementation, as
an attacker who is capable of obtaining $\kappa$ must still recover $x_1, \dots, x_t$ from
$x_1 K, \dots, x_t K$ and $\kappa$. This amounts to the attacker determining the invertible matrix
$T \in \mathrm{GL}_k(\mathbb{F}_q)$ for which $K = T\kappa$, a computationally difficult problem. Thus,
regenerating the vault would be possible if $h$ is sufficiently resistant to brute force attack. For
this, we require that the features have a significant amount of entropy (about 80 bits). This is beyond
the range of current fingerprint systems, but it may be considered for other authentication schemes,
such as iris recognition, facial recognition, or future developments.
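
The hashed variant can be sketched as a thin layer over an existing list of vault points. The code below is an illustration under assumptions made here: features are elements of $\mathbb{F}_2^k$ encoded as integers, SHA-256 from Python's `hashlib` stands in for the hash function $h$, and the names `feature_hash`, `hash_vault` and `witness_vectors` are hypothetical.

```python
import hashlib

def feature_hash(x, k):
    """Hash a feature x in F_2^k (encoded as an int) with SHA-256."""
    return hashlib.sha256(x.to_bytes((k + 7) // 8, "big")).hexdigest()

def hash_vault(points, k):
    """Replace the first coordinate x of every vault point (x, y) by h(x)."""
    return {feature_hash(x, k): y for x, y in points}

def witness_vectors(hashed_vault, witness, k):
    """Second coordinates selected by a witness set; they span the subspace to be decoded."""
    hashes = (feature_hash(x, k) for x in witness)
    return [hashed_vault[h] for h in hashes if h in hashed_vault]

points = [(1, 0b100011), (2, 0b010101), (4, 0b111000)]   # toy vault points (x, y)
hv = hash_vault(points, k=3)
selected = witness_vectors(hv, witness=[1, 2, 7], k=3)   # feature 7 is not in the vault
```

The witness never needs the stored first coordinates in the clear: it hashes its own features and looks up the matching second coordinates. With only $2^k$ possible features for small $k$, the hashes in this toy example could be inverted by exhaustive search, which is why the text asks for high-entropy features.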

There is also another important reason to use hashes as above in the system. Suppose
that an attacker finds an element in the unhashed version of the vault whose first coordinate is
a linear combination of the first coordinates of other elements in the vault. Then the attacker can
check whether its second coordinate is also a linear combination (with the same coefficients) of the
corresponding second coordinates of the other elements. If this happens, the attacker can infer that
the element belongs to $\mathcal{A}$. This attack can also be prevented by taking $t$ close to $k$, in
addition to using a hash function to hide the first coordinates.
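
The linear-relation test described above can be sketched for the simplest case. The code below is an illustration only: it assumes the binary field with vectors encoded as integers, and it checks only relations with two summands, $x_c = x_a + x_b$. The function name is hypothetical.

```python
from itertools import combinations

def linear_relation_suspects(vault):
    """First coordinates x_c with x_c = x_a + x_b and y_c = y_a + y_b over F_2."""
    by_x = dict(vault)
    suspects = set()
    for (xa, ya), (xb, yb) in combinations(vault, 2):
        xc = xa ^ xb                                   # sum of the first coordinates
        if xc in by_x and xc not in (xa, xb) and by_x[xc] == ya ^ yb:
            suspects.add(xc)                           # second coordinates obey the same relation
    return suspects

K1, K2 = 0b100011, 0b010101
vault = [(1, K1), (2, K2), (3, K1 ^ K2),               # authentic points (x, x*K)
         (4, 0b000001), (5, 0b000010)]                 # chaff points
print(linear_relation_suspects(vault))
```

In this toy vault the three authentic points are flagged and the chaff points are not. When the first coordinates are replaced by hashes, the relation among them is no longer visible and the test cannot be run.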